Design of Efficient Floating-Point Convolution Module for Embedded System

Abstract

The convolutional neural network (CNN) has achieved great success in many fields and is gradually being applied in edge-computing systems. Given the limited resource budget of such systems, implementing CNNs on embedded devices is preferred. However, increasingly complex CNNs come with a huge memory cost, which constrains their implementation on embedded devices. In this paper, we propose an efficient, pipelined convolution module based on Brain Floating-Point (BF16) to solve this problem. The module is composed of a quantization unit, a serial-to-matrix conversion unit, and a convolution operation unit. The mean error of the BF16 convolution module is only 0.1538%, which hardly affects CNN inference. Additionally, when synthesized at 400 MHz, the area of the BF16 convolution module is 21.23% and 18.54% smaller than that of the INT16 and FP16 convolution modules, respectively. Furthermore, our module using the TSMC 90 nm library can run at 1 GHz by optimizing the critical path. Finally, our module was implemented on the Xilinx PYNQ-Z2 board to evaluate its performance. The experimental results show that, at a frequency of 100 MHz, our module is 783.94 times and 579.35 times faster than the Cortex-M4 with FPU and the Hummingbird E203, respectively, while maintaining an extremely low error rate.
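To make the quantization step concrete, the following is a minimal software sketch of FP32-to-BF16 conversion followed by one convolution-window multiply-accumulate. It is not the paper's hardware design: the abstract does not state the rounding mode or the accumulator width, so round-to-nearest-even and a wide (Python float) accumulator are assumptions here, and the function names are hypothetical.

```python
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x.

    Assumption: round-to-nearest-even on the discarded low 16 bits.
    NaN inputs are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # 0x7FFF rounds halves down; adding the kept LSB turns ties into ties-to-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Expand a BF16 pattern back to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def quantize(x: float) -> float:
    """Round x to the nearest BF16-representable value."""
    return bf16_bits_to_float(fp32_to_bf16_bits(x))


def conv_window(window, kernel) -> float:
    """Multiply-accumulate one window against a kernel of the same shape.

    Operands are quantized to BF16; the accumulator is a Python float
    (an assumption, not taken from the paper).
    """
    acc = 0.0
    for w_row, k_row in zip(window, kernel):
        for w, k in zip(w_row, k_row):
            acc += quantize(w) * quantize(k)
    return acc
```

BF16 keeps the 8-bit exponent of FP32 and only 7 explicit mantissa bits, so the relative rounding error of a single normal value is bounded by 2^-8 under this rounding scheme. For example, `quantize(3.14159)` gives `3.140625`.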

Similar resources

Efficient 16-bit floating point interval processor for embedded applications

In the last ten years, interval techniques [1, 2] have allowed original solutions for many problems in engineering to be proposed, see, e.g., [3]. One of the main features of interval techniques is their ability to provide guaranteed results, i.e., with a verified accuracy or which are numerically proved. Consider for example, a bounded-error parameter estimation problem: the value of some para...

Design of Efficient Embedded System

Recently, fully integrated embedded systems have become popular for portable devices due to their cost reduction and low-power operation. To reduce the size of the embedded program, some hardware techniques such as nop-before-execution and programmable-delay-slot are introduced in the Core-A embedded processor [1]. However, the previous designs using the Core-A processor are not user-friendly due to the a...

Simulation and Design of Electronic Processing Circuit for Restaurants E-Procurement System

The poor orientation of restaurants toward information technology still leaves many unsolved issues with regard to customers. One of these problems, which tops the latter's list of complaints and has a negative impact on the prestige of the restaurant, arises when the restaurant does not respond on time to the customers' needs, which causes their dissatisfaction. This issue is really sensiti...

Advanced Pipelined Area and Speed Efficient Floating-Point ALU Embedded System in FPGA

This paper introduces a technique to design and develop a fully pipelined and optimized floating-point embedded processor in FPGA using the IEEE 754 format. The floating-point embedded processor performs many operations such as FP-Arithmetic, FP-Logical, FP-Trigonometric, FP-Vector, FP-Complex, FP-Signed, FP-Unsigned. In an existing system, a fixed point illustra...

Design and Implementation of Efficient Adder based Floating Point Multiplier

In this paper, a new idea is proposed to increase the speed of a single-precision floating-point multiplier. In floating-point multiplication, adders are used at different places. The implementation uses efficient adders for compressing the partial products, adding the exponents, and at the final stage. First, different adders are compared based on delay, and then the multiplier is designed using the best ...

Journal

Journal title: Electronics

Year: 2021

ISSN: 2079-9292

DOI: https://doi.org/10.3390/electronics10040467